Matching Top-k Answers of Twig Patterns in Probabilistic XML

Authors

  • Bo Ning
  • Chengfei Liu
  • Jeffrey Xu Yu
  • Guoren Wang
  • Jianxin Li
Abstract

The flexibility of the XML data model allows a more natural representation of uncertain data than the relational model does. Matching a twig pattern against probabilistic XML data and returning the top-k answers is an essential operation. Some classical twig pattern algorithms can be adapted to process probabilistic XML. However, when the goal is to find the answers with the top-k probabilities, the existing algorithms perform poorly because they must process many unnecessary intermediate path results that have small probabilities. To address this problem, we propose a new encoding scheme for probabilistic XML called PEDewey. Based on this encoding scheme, we then design two algorithms for finding the answers with the top-k probabilities for twig queries. The first, ProTJFast, processes probabilistic XML data using element streams in document order; the second, PTopKTwig, uses element streams ordered by path probability. Experiments have been conducted to study the performance of these algorithms.
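To make the notion of path probability concrete, here is a minimal Python sketch. It is not PEDewey, ProTJFast, or PTopKTwig, none of which are specified in this abstract. It assumes a simplified model in which every node carries an independent probability of existing given that its parent exists, so the probability of a root-to-node path is the product of those values. The names `Node`, `path_probabilities`, and `top_k` are hypothetical and introduced only for illustration.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    prob: float = 1.0  # assumption: P(node exists | parent exists)
    children: list = field(default_factory=list)


def path_probabilities(root, tag):
    """Yield (path_probability, path) for every node whose tag matches."""
    stack = [(root, root.prob, (root.tag,))]
    while stack:
        node, p, path = stack.pop()
        if node.tag == tag:
            yield p, "/".join(path)
        for child in node.children:
            # Path probability is the product of the probabilities on the path.
            stack.append((child, p * child.prob, path + (child.tag,)))


def top_k(root, tag, k):
    """Return the k matches with the highest path probability, best first."""
    return heapq.nlargest(k, path_probabilities(root, tag))


# Hypothetical document: two uncertain <book> elements, each with a <title>.
doc = Node("bib", 1.0, [
    Node("book", 0.9, [Node("title", 0.5)]),
    Node("book", 0.4, [Node("title", 1.0)]),
])
best = top_k(doc, "title", 1)
```

This sketch enumerates every matching path before ranking, which is exactly the kind of wasted work on low-probability paths that the abstract attributes to existing algorithms; consuming element streams in descending order of path probability is the idea the abstract associates with PTopKTwig.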

Similar resources

TWIX: Approximate and Exact Twig Structure and Content Matching over XML Document Collections using Binary Labeling

XML queries specify predicates on the content and the structure of the elements of tree-structured XML documents. Hence, discovering the occurrences of twig (tree structure) query patterns is a core operation for XML query processing. Prior works have typically applied top-down decomposition of the twig patterns into (i) binary (parent-child or ancestor-descendant) relationships, or (ii) path e...

MARS: A Matching and Ranking System for XML Content and Structure Retrieval

Structural queries specify complex predicates on the content and the structure of the elements of tree-structured XML documents. Recent works have typically applied top-down decomposition of the twig patterns into (i) parent-child or ancestor-descendant relationships, or (ii) path expression queries, and then followed by a join operation to reconstruct matched twig patterns. This demonstration s...

Matching Twigs in Probabilistic XML

Evaluation of twig queries over probabilistic XML is investigated. Projection is allowed and, in particular, a query may be Boolean. It is shown that for a well-known model of probabilistic XML, the evaluation of twigs with projection is tractable under data complexity (whereas in other probabilistic data models, projection is intractable). Under query-and-data complexity, the problem becomes in...

Twig Pattern Matching Algorithms for XML

The emergence of XML promised significant advances in B2B integration, because users can store or transmit structured data using this highly flexible open standard. An effective, well-formed XML document structure helps convert data into useful information that can be processed quickly and efficiently. Hence there is a need for efficient processing of queries on XML data in XML da...

Twig Patterns: From XML Trees to Graphs

Existing approaches for querying XML (e.g., XPath and twig patterns) assume that the data form a tree. Often, however, XML documents have a graph structure, due to ID references. The common way of adapting known techniques to XML graphs is straightforward, but may result in a huge number of results, where only a small portion of them has valuable information. We propose two mechanisms. Filterin...

Publication date: 2010